Abstract

We obtain a 1.5-approximation algorithm for the metric uncapac- 
itated facility location problem (UFL), which improves on the previ- 
ously best known 1.52-approximation algorithm by Mahdian, Ye and 
Zhang. Note that the approximability lower bound by Guha and
Khuller is 1.463...

An algorithm is a $(\lambda_f, \lambda_c)$-approximation algorithm if the solution
it produces has total cost at most $\lambda_f \cdot F^* + \lambda_c \cdot C^*$, where $F^*$ and $C^*$
are the facility and the connection cost of an optimal solution. Our
new algorithm, which is a modification of the $(1 + 2/e)$-approximation
algorithm of Chudak and Shmoys, is a (1.6774, 1.3738)-approximation
algorithm for the UFL problem and is the first one that touches the
approximability limit curve $(\gamma_f, 1 + 2e^{-\gamma_f})$ established by Jain, Mahdian
and Saberi. As a consequence, we obtain the first optimal approximation
algorithm for instances dominated by connection costs. When
combined with a (1.11, 1.7764)-approximation algorithm proposed by
Jain et al., and later analyzed by Mahdian et al., we obtain the overall
approximation guarantee of 1.5 for the metric UFL problem. We also
describe how to use our algorithm to improve the approximation ratio
for the 3-level version of UFL.
1 Introduction



1.1 Background on Uncapacitated Facility Location 

The Uncapacitated Facility Location (UFL) problem is defined as follows. 
We are given a set $\mathcal{F}$ of facilities and a set $\mathcal{C}$ of clients. For every facility
$i \in \mathcal{F}$ there is a nonnegative number $f_i$ denoting the opening cost of the
facility. Furthermore, for every client $j \in \mathcal{C}$ and facility $i \in \mathcal{F}$, there is a
connection cost $c_{ij}$ between facility $i$ and client $j$. The goal is to open a
subset of the facilities $\mathcal{F}' \subseteq \mathcal{F}$, and connect each client to an open facility
so that the total cost is minimized. The UFL problem is NP-complete, and
MAX SNP-hard (see [13]). A UFL instance is metric if its connection cost
function satisfies the following variant of the triangle inequality: 

$$c_{ij} \le c_{ij'} + c_{i'j'} + c_{i'j} \quad \text{for any } j, j' \in \mathcal{C} \text{ and } i, i' \in \mathcal{F}. \qquad (1)$$
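A minimal sketch of how inequality (1) could be checked on a small instance. The function name `is_metric` and the cost-matrix layout `c[i][j]` (facility `i`, client `j`) are assumptions made for this illustration only.

```python
from itertools import product

def is_metric(c):
    """Check c[i][j] <= c[i][j2] + c[i2][j2] + c[i2][j] for all i, i2, j, j2."""
    facilities = range(len(c))
    clients = range(len(c[0])) if c else range(0)
    return all(
        c[i][j] <= c[i][j2] + c[i2][j2] + c[i2][j]
        for i, i2 in product(facilities, repeat=2)
        for j, j2 in product(clients, repeat=2)
    )

# Facilities at positions 0 and 1, clients at positions 0 and 3 on a line.
line_costs = [[0, 3], [1, 2]]
# Facility 0 is at distance 10 from client 0, although a detour
# via client 1 and facility 1 costs only 1 + 1 + 1 = 3.
broken_costs = [[10, 1], [1, 1]]

print(is_metric(line_costs), is_metric(broken_costs))
```

The check is exhaustive over all quadruples, so it is only meant for toy instances.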

We will say that an algorithm is a $\lambda$-approximation algorithm for a
minimization problem if it computes, in polynomial time, a solution that is at
most $\lambda$ times as expensive as the optimal solution. Specifically, for
the UFL problem we consider the notion of bifactor approximation
introduced by Charikar and Guha [8]. We say that an algorithm is a
$(\lambda_f, \lambda_c)$-approximation algorithm if the solution it delivers has total cost at most
$\lambda_f \cdot F^* + \lambda_c \cdot C^*$, where $F^*$ and $C^*$ denote, respectively, the facility and
the connection cost of an optimal solution. Note the potential ambiguity 
resulting from the possible existence of multiple optimal solutions. When 
presenting our algorithm, we will compare the solution cost only to the cost 
of the initial fractional solution. Nevertheless, as we observe at the end of 
Section adding an additional scaling step to our algorithm is sufficient to 
get a guarantee in a comparison with any feasible fractional solution. 

Guha and Khuller [13] proved by a reduction from Set Cover that there is
no polynomial time $\lambda$-approximation algorithm for the metric UFL problem
with $\lambda < 1.463$, unless $NP \subseteq DTIME(n^{\log\log n})$. Sviridenko showed that
the approximation lower bound of 1.463 holds, unless P = NP (see [23]).
Jain et al. [16] generalized the argument of Guha and Khuller to show that
the existence of a $(\lambda_f, \lambda_c)$-approximation algorithm with $\lambda_c < 1 + 2e^{-\lambda_f}$
would imply $NP \subseteq DTIME(n^{\log\log n})$.

The UFL problem has a rich history starting in the 1960s. The first
results on approximation algorithms are due to Cornuéjols, Fisher, and
Nemhauser [11], who considered the problem with an objective function of
maximizing the "profit" of connecting clients to facilities minus the cost of
opening facilities. They showed that a greedy algorithm gives an approximation
ratio of $(1 - 1/e) = 0.632\ldots$, where $e$ is the base of the natural
logarithm. This ratio was later improved to 0.828 by Ageev and Sviridenko [2].

For the objective function of minimizing the sum of connection cost 
and opening cost, Hochbaum [15] presented a greedy algorithm with an 
$O(\log n)$ approximation guarantee, where $n$ is the number of clients. By
a straightforward reduction from the Set Cover problem, it can be shown
that this cannot be improved unless $NP \subseteq DTIME[n^{O(\log\log n)}]$ due to
a result by Feige [12]. However, if the connection costs are restricted to
satisfy the triangle inequality (1), then constant approximation guarantees
can be obtained. In all results mentioned below, except for the maximization 
objectives, it is assumed that the costs satisfy these restrictions. If the 
distances between facilities and clients are Euclidean, then for some location 
problems approximation schemes have been obtained [4]. 

The first approximation algorithm with constant approximation ratio for 
the metric minimization problem was developed by Shmoys, Tardos, and 
Aardal [21]. Since then numerous improvements have been made. Guha 
and Khuller [13, 14] introduced a greedy augmentation procedure (see also
Charikar and Guha [8]). A series of approximation algorithms based
on LP-rounding was then developed (see e.g. [9, 10, 22]). There are also
greedy algorithms that only use the LP-relaxation implicitly to obtain a 
lower bound for a primal-dual analysis. An example is the JMS 1.61- 
approximation algorithm developed by Jain, Mahdian, and Saberi [16]. Some 
algorithms combine several techniques, like the 1.52-approximation
algorithm of Mahdian, Ye, and Zhang [18, 19], which uses the JMS algorithm
and the greedy augmentation procedure. Up to now, their approximation 
ratio of 1.52 was the best known. Many more algorithms have been consid- 
ered for the UFL problem and its variants. We refer the interested reader 
to survey papers by Shmoys [20] and Vygen [23].

1.2 Some basic techniques 

In several LP-based approximation algorithms a clustering step is part of an 
algorithm for creating a feasible solution; see Section 2.2 for more details.
In this step a not yet clustered client is chosen as the so-called "cluster 
center" and one of the facilities that fractionally serves the cluster center 
in the LP solution is opened. Our main technique is to modify the support 
graph corresponding to the LP solution before clustering, and to use various 
average distances in the fractional solution to bound the cost of the obtained 
solution. 

A similar way of modifying the LP-solution, called filtering, was
introduced by Lin and Vitter [17]. Lin and Vitter considered a broad class of 0-1
problems having both covering and packing constraints. They start by solv- 
ing the LP-relaxation of the problem, and in the subsequent filtering step 
they select a subset of the variables that have positive value in the LP solu- 
tion and that have relatively large objective coefficients. These variables are 
set equal to zero, which results in a modified problem. The LP-relaxation of 
this modified problem is then solved and rounding is applied. In the paper 
by Shmoys et al. [21] filtering was also used in order to bound the connection
costs. Here again a subset of the variables that have a positive value in
the LP-solution are set equal to zero. The remaining positive variables were 
scaled so as to remain feasible for the original LP-relaxation. 

Later, Chudak [9] observed that the LP-relaxation was already filtered 
in a certain sense as it is possible to state that if a client is fractionally 
connected to a facility in the LP-solution, then one can bound the cost of this 
connection in terms of the optimal LP-dual variables. This observation was 
later used by Aardal, Chudak, and Shmoys [1] in their algorithm for
multi-level problems, and by Sviridenko [22]. The filtering done in our algorithm
is slightly different as the filtered LP-solution is not necessarily feasible with 
respect to the LP-relaxation. Throughout this paper we will use the name 
sparsening technique for the combination of filtering with our new analysis. 

1.3 Our contribution 

We modify the $(1 + 2/e)$-approximation algorithm of Chudak [9] (see also
Chudak and Shmoys [10]) to obtain a new (1.6774, 1.3738)-approximation
algorithm for the UFL problem. Our linear programming (LP) rounding
algorithm is the first one that achieves an optimal bifactor approximation
due to the matching lower bound of $(\lambda_f, 1 + 2e^{-\lambda_f})$ established by Jain et
al. [16]. In fact we obtain an algorithm for each point $(\lambda_f, 1 + 2e^{-\lambda_f})$ such
that $\lambda_f \ge 1.6774$, which means that we have an optimal approximation
algorithm for instances dominated by connection cost (see Figure 1).

One of the main technical contributions of the paper is the proof of 
Lemma 3.1, which gives a bound on the expected connection cost in the
case of using a path via a cluster center to connect a client. This lemma may
potentially be useful in constructing new algorithms for UFL and related 
problems. 

One could view our contribution as an improved analysis of a minor
modification of the algorithm by Sviridenko [22], which also introduces filtering
to the algorithm of Chudak and Shmoys. The filtering process that is used
both in our algorithm and in the algorithm by Sviridenko is relatively easy
to describe, but the analysis of the impact of this technique on the quality
of the obtained solution is quite involved in each case. Therefore, we prefer
to state our algorithm as an application of the sparsening technique to the
algorithm of Chudak and Shmoys, which in our opinion is relatively easy to
describe and analyze.

Figure 1: Bifactor approximation picture. The gray area corresponds to the
improvement due to our algorithm.

We start by observing that for a certain class of instances the analysis 
of the algorithm of Chudak and Shmoys may be improved. We call these 
instances regular, and for the other instances we propose a measure of their 
irregularity. The goal of the sparsening technique is to exploit the
irregularity of instances that are potentially tight for the original algorithm of
Chudak and Shmoys. We cluster the given instance in the same way as in
the 1.58-approximation algorithm by Sviridenko [22], but we continue our
algorithm in the spirit of Chudak and Shmoys' algorithm, and we use certain 
average distances to control the irregularities, which leads to an improved 
bifactor approximation guarantee. 

Our new algorithm may be combined with the (1.11, 1.7764)-approximation
algorithm of Jain et al. to obtain a 1.5-approximation algorithm for the
UFL problem. This is an improvement over the previously best known
1.52-approximation algorithm of Mahdian et al., and it cuts off a third of the gap
with the approximation lower bound by Guha and Khuller [13]. An earlier
version of this paper appeared in [5].
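The arithmetic behind the figure 1.5 can be checked with a short sketch. It rests on an assumption that this section does not spell out: that the two algorithms are combined by running the new one with some probability `p` and the algorithm of Jain et al. otherwise, so that the expected bifactor is the convex combination of the two guarantees. The helper `mix` is hypothetical.

```python
def mix(p, a, b):
    """Bifactor of running algorithm a with probability p and b otherwise."""
    return (p * a[0] + (1 - p) * b[0], p * a[1] + (1 - p) * b[1])

ours = (1.6774, 1.3738)   # (lambda_f, lambda_c) of the new algorithm
jms = (1.11, 1.7764)      # (lambda_f, lambda_c) of the algorithm of Jain et al.

# Choose p so that both coordinates coincide:
# p * ours_f + (1 - p) * jms_f = p * ours_c + (1 - p) * jms_c.
p = (jms[1] - jms[0]) / ((jms[1] - jms[0]) + (ours[0] - ours[1]))
lam_f, lam_c = mix(p, ours, jms)
print(round(p, 3), round(lam_f, 4), round(lam_c, 4))
```

When both coordinates are equal to some value at most 1.5, the expected cost is at most 1.5 times the sum of facility and connection cost of the reference solution.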

We now give an informal sketch of our algorithm. Using this description 
we give an outline of the paper. 

Sketch of the algorithm. 

1. Solve the LP relaxation of the problem. 

2. Modify the fractional solution by: 

• scaling up the facility opening variables, 

• modifying the connection variables to completely use the "clos- 
est" fractionally open facilities, 

• splitting facilities, if necessary, such that there is no slack between 
the amount that a client is assigned to a facility, and the amount 
by which this facility is opened. 

3. Divide clients into clusters based on the current fractional solution. In 
each cluster a specific client is assigned to be a "cluster center".

4. For every cluster, open one of the "close" facilities of the cluster center. 

5. For each facility not considered above, open it independently with 
probability equal to the fractional opening. 

6. Connect each client to an open facility that is closest to it. 

In Section 2 we give a brief overview of the main ingredients of some
known approximation algorithms for UFL. In particular we state the LP
relaxation of UFL, describe clustering, scaling, and greedy augmentation. The
clustering technique is common for the existing LP-rounding algorithms for
UFL, and it is applied in Steps 3 and 4 of the above algorithm. Sparsening
of the support graph of the LP solution, which is the essence of Step 2,
is discussed in Section 3, where we also prove the crucial lemma on
certain connection costs. A more detailed description of the algorithm and its
analysis are presented in Section 4, and the 1.5-approximation algorithm is
stated in Section 5. In Section 6 we show that the new
(1.6774, 1.3738)-approximation algorithm may also be used to improve the approximation
ratio for the 3-level version of the UFL problem to 2.492. A randomized
approach to clustering is discussed in Section 7, and, finally, in Section 8 we
present some concluding remarks and open problems.
2 Preliminaries



We will review the concept of LP-rounding algorithms for the metric UFL 
problem. These are algorithms that first solve the linear relaxation of a given 
integer programming (IP) formulation of the problem, and then round the 
fractional solution to produce an integer solution with a value not too much 
higher than the starting fractional solution. Since the optimal fractional 
solution is at most as expensive as an optimal integral solution, we obtain 
an estimation of the approximation factor. 

2.1 IP formulation and relaxation 

The UFL problem has a natural formulation as the following IP problem. 

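$$
\begin{array}{lll}
\min & \sum_{i \in \mathcal{F},\, j \in \mathcal{C}} c_{ij} x_{ij} + \sum_{i \in \mathcal{F}} f_i y_i & \\
\text{s.t.} & \sum_{i \in \mathcal{F}} x_{ij} = 1 & \text{for all } j \in \mathcal{C}, \\
& x_{ij} - y_i \le 0 & \text{for all } i \in \mathcal{F},\ j \in \mathcal{C}, \\
& x_{ij}, y_i \in \{0, 1\} & \text{for all } i \in \mathcal{F},\ j \in \mathcal{C}. \qquad (2)
\end{array}
$$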
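For intuition, the integer program can be solved by exhaustive search on toy instances. The following is only an illustrative sketch, exponential in the number of facilities; the function name `solve_ufl_brute_force` and the example data are assumptions, not part of the paper.

```python
from itertools import combinations

def solve_ufl_brute_force(f, c):
    """Return (cost, open_set) minimizing opening plus connection cost.

    f[i] is the opening cost of facility i and c[i][j] the cost of
    connecting client j to facility i.
    """
    n_fac, n_cli = len(f), len(c[0])
    best = None
    for k in range(1, n_fac + 1):
        for open_set in combinations(range(n_fac), k):
            opening = sum(f[i] for i in open_set)
            # Once y is fixed, each client sets x_ij = 1 for its cheapest open facility.
            connection = sum(min(c[i][j] for i in open_set) for j in range(n_cli))
            cost = opening + connection
            if best is None or cost < best[0]:
                best = (cost, open_set)
    return best

f = [3, 3, 10]
c = [[1, 1, 5, 5],
     [5, 5, 1, 1],
     [2, 2, 2, 2]]
best_cost, best_open = solve_ufl_brute_force(f, c)
print(best_cost, best_open)
```

The sketch uses the fact that, for fixed opening variables, the best connection variables are determined client by client.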

A linear relaxation of this IP formulation is obtained by replacing the
integrality constraints (2) by the constraint $x_{ij} \ge 0$ for all $i \in \mathcal{F}$, $j \in \mathcal{C}$.
The value of the solution to this LP relaxation will serve as a lower bound 
for the cost of the optimal solution. We will also make use of the following 
dual formulation of this LP. 



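$$
\begin{array}{lll}
\max & \sum_{j \in \mathcal{C}} v_j & \\
\text{s.t.} & \sum_{j \in \mathcal{C}} w_{ij} \le f_i & \text{for all } i \in \mathcal{F}, \\
& v_j - w_{ij} \le c_{ij} & \text{for all } i \in \mathcal{F},\ j \in \mathcal{C}, \\
& w_{ij} \ge 0 & \text{for all } i \in \mathcal{F},\ j \in \mathcal{C}.
\end{array}
$$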



2.2 Clustering 

The first constant factor approximation algorithm for the metric UFL prob- 
lem by Shmoys et al., but also the algorithms by Chudak and Shmoys, and 
by Sviridenko are based on the following clustering procedure. Suppose we 
are given an optimal solution to the LP relaxation of our problem. Consider
the bipartite graph $G = ((V', V''), E)$ with vertices $V'$ being the facilities
and $V''$ the clients of the instance, and where there is an edge between a
facility $i \in V'$ and a client $j \in V''$ if the corresponding variable $x_{ij}$ in the
optimal solution to the LP relaxation is positive. We call $G$ a support graph
of the LP solution. If two clients are both adjacent to the same facility in
graph $G$, we will say that they are neighbors in $G$.

Figure 2: A cluster. If we make sure that at least one facility is open close
to a cluster center $j'$, then any other client $j$ from the cluster may use this
facility. Because the connection costs are assumed to be metric, the distance
to this facility is at most the length of the shortest path from $j$ to the open
facility.

The clustering in this graph is a partitioning of clients into clusters to- 
gether with a choice of a leading client for each of the clusters. This leading 
client is called a cluster center. Additionally we require that no two cluster 
centers are neighbors in the support graph. This property helps us to open 
one of the adjacent facilities for each cluster center. For a picture of a cluster
see Figure 2.

The algorithms by Shmoys et al., Chudak and Shmoys, and by Sviridenko 
all use the following procedure to obtain the clustering: While not all the 
clients are clustered, choose greedily a new cluster center j, and build a 
cluster from j and all the neighbors of j that are not yet clustered. Obviously 
the outcome of this procedure is a proper clustering. Moreover, it has the
desired property that clients are "close" to their cluster centers. Each of
the mentioned LP-rounding algorithms uses a different greedy criterion for
choosing new cluster centers. In our algorithm we will use the clustering 
with the greedy criterion of Sviridenko [22]. Another way of clustering is 
presented in Section 7. 
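The generic clustering procedure described above can be sketched as follows. The representation of the support graph as a dictionary `support` (client to set of adjacent facilities) and the generic `priority` key standing in for the greedy criterion are assumptions for this illustration; the specific criterion of each cited algorithm is not reproduced here.

```python
def greedy_clustering(support, priority):
    """Cluster the clients of a support graph.

    support[j] is the set of facilities adjacent to client j;
    priority[j] is the greedy key (the smallest is chosen first).
    Returns a dict mapping each cluster center to its list of clients.
    """
    unclustered = set(support)
    clusters = {}
    while unclustered:
        center = min(unclustered, key=lambda j: (priority[j], j))
        # The cluster is the center plus all its unclustered neighbors.
        members = [j for j in sorted(unclustered)
                   if j == center or support[j] & support[center]]
        clusters[center] = members
        unclustered -= set(members)
    return clusters

support = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}
priority = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 3.0}
clusters = greedy_clustering(support, priority)
print(clusters)
```

A client chosen as a new center is still unclustered, so it cannot share a facility with an earlier center; this is why no two cluster centers end up as neighbors.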

2.3 Scaling and greedy augmentation 

The techniques described here are not directly used by our algorithm, but 
they help to explain why the algorithm of Chudak and Shmoys is close to 
optimal. We will discuss how scaling facility opening costs before running 
an algorithm, together with another technique, called greedy augmentation, 
may help to balance the analysis of an approximation algorithm for the UFL 
problem. 

The greedy augmentation technique introduced by Guha and Khuller [13]
(see also [8]) is as follows. Consider an instance of the metric UFL problem
and a feasible solution. For each facility $i \in \mathcal{F}$ that is not opened in this
solution, we may compute the amount of cost that is saved by opening
facility $i$, also called the gain of opening $i$, denoted by $g_i$. While there exists
a facility $i$ with positive gain $g_i$, the greedy augmentation procedure opens
a facility that maximizes the ratio $g_i / f_i$ of gain to the facility opening cost
and updates the remaining values of $g_i$.
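A minimal sketch of the greedy augmentation loop, assuming strictly positive opening costs so that the ratio of gain to opening cost is defined. The function name and the data layout (`f[i]`, `c[i][j]`) are illustrative choices.

```python
def greedy_augmentation(f, c, open_set):
    """Open extra facilities while some closed facility has positive gain.

    The gain of facility i is the total cost saved by opening it: the
    reduction in connection cost minus f[i]. Assumes f[i] > 0 and a
    non-empty initial open_set.
    """
    open_set = set(open_set)
    n_cli = len(c[0])
    dist = [min(c[i][j] for i in open_set) for j in range(n_cli)]
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(f)):
            if i in open_set:
                continue
            saving = sum(max(0, dist[j] - c[i][j]) for j in range(n_cli))
            gain = saving - f[i]
            if gain > 0 and gain / f[i] > best_ratio:
                best, best_ratio = i, gain / f[i]
        if best is None:
            return open_set
        open_set.add(best)
        # Update the distance of every client to its nearest open facility.
        dist = [min(dist[j], c[best][j]) for j in range(n_cli)]

f = [3, 3, 10]
c = [[1, 1, 5, 5],
     [5, 5, 1, 1],
     [2, 2, 2, 2]]
result = greedy_augmentation(f, c, {0})
print(result)
```

Starting from facility 0 alone, opening facility 1 saves 8 in connection cost for an opening cost of 3, so it is added; afterwards no facility has positive gain.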

Suppose we are given an approximation algorithm $A$ for the metric UFL
problem and a real number $\delta \ge 1$. Consider the following algorithm $S_\delta(A)$.

1. scale up all facility opening costs by a factor $\delta$;

2. run algorithm A on the modified instance; 

3. scale back the opening costs; 

4. run the greedy augmentation procedure. 

Following the analysis of Mahdian, Ye, and Zhang [18] one may prove 
the following lemma. 

Lemma 2.1 Suppose $A$ is a $(\lambda_f, \lambda_c)$-approximation algorithm for the metric
UFL problem, then $S_\delta(A)$ is a $(\lambda_f + \ln(\delta), 1 + \frac{\lambda_c - 1}{\delta})$-approximation algorithm
for this problem.

This method may be applied to balance a $(\lambda_f, \lambda_c)$-approximation
algorithm with $\lambda_f \ll \lambda_c$. However, our 1.5-approximation algorithm is
balanced differently. It is a composition of two algorithms that have opposite
imbalances. 
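The trade-off of Lemma 2.1 can be explored numerically. The sketch below assumes the lemma in the form $(\lambda_f + \ln\delta,\ 1 + (\lambda_c - 1)/\delta)$ and applies it to the starting point $(1, 1 + 2/e)$; it then compares the result with the curve $\lambda_c = 1 + 2e^{-\lambda_f}$ of Jain et al. The function names are hypothetical.

```python
import math

def scale_and_augment(lam_f, lam_c, delta):
    """Bifactor of S_delta(A) according to Lemma 2.1."""
    return lam_f + math.log(delta), 1 + (lam_c - 1) / delta

def hardness_curve(lam_f):
    """Lower-bound curve lambda_c = 1 + 2 * exp(-lambda_f)."""
    return 1 + 2 * math.exp(-lam_f)

start = (1.0, 1 + 2 / math.e)
points = [scale_and_augment(*start, delta) for delta in (1.0, 1.25, 1.5, 2.0)]
for lam_f, lam_c in points:
    print(round(lam_f, 4), round(lam_c, 4), round(hardness_curve(lam_f), 4))
```

For this starting point every scaled guarantee lies on the curve, since $1 + \frac{2/e}{\delta} = 1 + 2e^{-(1 + \ln\delta)}$.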
3 Sparsening the graph of the fractional solution



In this section we describe a technique that we use to control the expected 
connection cost of the obtained integer solution. Our technique is based on 
the concept of filtering, introduced by Lin and Vitter [17]; see Section 1.2.
We will give an alternative analysis of the effect of filtering on a fractional 
solution to the LP relaxation of the UFL problem. 

Suppose that, for a given UFL instance, we have solved its LP relaxation,
and that the optimal primal solution is $(x^*, y^*)$ and the corresponding
optimal dual solution is $(v^*, w^*)$. Such a fractional solution has facility cost
$F^* = \sum_{i \in \mathcal{F}} f_i y_i^*$ and connection cost $C^* = \sum_{i \in \mathcal{F}, j \in \mathcal{C}} c_{ij} x_{ij}^*$. Each client $j$ has
its share $v_j^*$ of the total cost. This cost may again be divided into a client's
fractional connection cost $C_j^* = \sum_{i \in \mathcal{F}} c_{ij} x_{ij}^*$, and its fractional facility cost
$F_j^* = v_j^* - C_j^*$.

3.1 Motivation and intuition 

The idea behind the sparsening technique is to make use of irregularities of 
an instance if they occur. We call an instance locally regular around client 
j if the facilities that serve j in the fractional solution (x* , y* ) are all at 
the same distance from j. An instance which is locally regular around each 
client is called regular. We begin by observing that for such an instance the 
algorithm of Chudak and Shmoys produces a solution whose cost is bounded 
by $F^* + (1 + \frac{2}{e})C^*$, which is an easy consequence of the original analysis [10],
but also follows from our analysis in Section 4. Although this observation
might not be very powerful itself, the value $F^* + (1 + \frac{2}{e})C^*$ happens to be
the intersection point between the bifactor approximation lower bound curve
$(\lambda_f, 1 + 2e^{-\lambda_f})$ and the y-axis in Figure 1. Moreover, for regular instances we
may apply the technique described in Section 2.3 to obtain an approximation
algorithm corresponding to any single point on this curve. In particular, we
may simply use this construction to get an optimal 1.463...-approximation
algorithm for regular instances of the metric UFL problem. Note that the
proof of the matching hardness of approximation also uses instances that 
are essentially regular. 

These instances come from a reduction from the SET COVER problem. Clients
represent elements to be covered, and facilities represent subsets. The distance $c_{ij}$ equals
1 if subset $i$ contains element $j$ and it equals 3 otherwise. To formally argue about the
regularity of such an instance we would need to construct an optimal fractional solution
using only facilities at distance 1.

The instances that are not regular are called irregular and these are the
instances for which it is more difficult to create a feasible integer solution
with good bounds on the connection cost. In fractional solutions of irregular 
instances there exist clients that are fractionally served by facilities at dif- 
ferent distances. Our approach is to divide facilities serving a client into two 
groups, namely close and distant facilities. We will remove links to distant 
facilities before the clustering step, so that if there are irregularities, then 
distances to cluster centers will decrease. 

We measure the local irregularity of an instance by comparing the frac- 
tional connection cost of a client to the average distance to its distant fa- 
cilities. In the case of a regular instance, the sparsening technique gives 
the same results as the technique described in Section 2.3, but for irregular
instances sparsening makes it possible to construct an integer solution with 
a better bound on the connection costs. 

3.2 Details 

We will start by modifying the optimal fractional LP-solution $(x^*, y^*)$ by
scaling the y-variables by a constant $\gamma > 1$ to obtain a fractional solution
$(x^*, \bar{y})$, where $\bar{y} = \gamma \cdot y^*$. Note that by scaling we might set some $\bar{y}_i > 1$.
In the filtering of Shmoys et al. such a variable would instantly be rounded
to 1. However, for the compactness of a later part of our analysis it is
important not to round these variables, but rather to split facilities. Before
we discuss splitting, let us first modify the connection variables. A version
of this argument, which describes all these modifications of the fractional
solution at once, is given in [22, Lemma 1].

Suppose that the values of the y-variables are scaled and fixed, but that
we now have the freedom to change the values of the x-variables in order to
minimize the connection cost. For each client $j$ we compute the values of
the corresponding x-variables in the following way. We choose an ordering
of facilities with nondecreasing distances to client $j$. We connect client $j$
to the first facilities in the ordering so that among the facilities fractionally
serving $j$, only the last one in the chosen ordering may be opened by more
than the amount by which it serves $j$. Formally, for any facilities $i$ and $i'$
such that $i'$ is later in the ordering, if $\bar{x}_{ij} < \bar{y}_i$ then $\bar{x}_{i'j} = 0$.
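The reassignment of the connection variables of a single client can be sketched as a greedy fill over facilities sorted by distance. The function name `reassign_client` and the numbers in the example are assumptions chosen for illustration.

```python
def reassign_client(y_bar, dist):
    """Greedy connection variables of one client for fixed openings.

    y_bar[i] is the scaled opening of facility i and dist[i] its
    distance to the client. Nearest facilities are used first until the
    client is fully served (assuming sum(y_bar) >= 1).
    """
    x_bar = [0.0] * len(y_bar)
    remaining = 1.0
    for i in sorted(range(len(y_bar)), key=lambda i: dist[i]):
        x_bar[i] = min(y_bar[i], remaining)
        remaining -= x_bar[i]
        if remaining <= 0:
            break
    return x_bar

gamma = 1.5
y_star = [0.5, 0.25, 0.25]            # original openings used by the client
y_bar = [gamma * y for y in y_star]   # scaled openings: 0.75, 0.375, 0.375
x_bar = reassign_client(y_bar, dist=[1.0, 2.0, 4.0])
print(x_bar)
```

In this example only the second facility is opened by more than the amount by which it serves the client, and the most distant facility is no longer used at all.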

In the next step, we eliminate the occurrences of situations where
$0 < \bar{x}_{ij} < \bar{y}_i$. We do so by creating an equivalent instance of the UFL problem,
where facility $i$ is split into two identical facilities $i'$ and $i''$. In the new
setting, the opening of facility $i'$ is $\bar{x}_{ij}$ and the opening of facility $i''$ is
$\bar{y}_i - \bar{x}_{ij}$. The values of the x-variables are updated accordingly. By repeatedly
applying this procedure we obtain a so-called complete solution $(\bar{x}, \bar{y})$, i.e.,
a solution in which no pair $i \in \mathcal{F}$, $j \in \mathcal{C}$ exists such that $0 < \bar{x}_{ij} < \bar{y}_i$
(see [22, Lemma 1] for a more detailed argument).

In the new complete solution $(\bar{x}, \bar{y})$ we distinguish groups of facilities
that are especially important for a particular client. For a client $j$ we say
that a facility $i$ is one of its close facilities if it fractionally serves client $j$
in $(\bar{x}, \bar{y})$; $\mathcal{C}_j = \{i \in \mathcal{F} \mid \bar{x}_{ij} > 0\}$ is the set of close facilities of $j$. If $\bar{x}_{ij} = 0$,
but facility $i$ was serving client $j$ in solution $(x^*, y^*)$, then we say that $i$ is a
distant facility of client $j$; $\mathcal{D}_j = \{i \in \mathcal{F} \mid \bar{x}_{ij} = 0, x^*_{ij} > 0\}$ is the set of distant
facilities of $j$.

We will extensively use the average distances between single clients and 
groups of facilities defined as follows. 

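Definition For any client $j \in \mathcal{C}$, and for any subset of facilities $\mathcal{F}' \subset \mathcal{F}$
such that $\sum_{i \in \mathcal{F}'} \bar{y}_i > 0$, let

$$d(j, \mathcal{F}') = \frac{\sum_{i \in \mathcal{F}'} c_{ij} \cdot \bar{y}_i}{\sum_{i \in \mathcal{F}'} \bar{y}_i}.$$

To interpret differences between certain average distances we will use the
following parameter.

Definition Let

$$r_\gamma(j) = \begin{cases} \dfrac{d(j, \mathcal{D}_j) - d(j, \mathcal{D}_j \cup \mathcal{C}_j)}{F_j^*} & \text{for } F_j^* > 0, \\ 0 & \text{for } F_j^* = 0. \end{cases}$$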



The value $r_\gamma(j)$ is a measure of the irregularity of the instance around
client $j$. It is the average distance to a distant facility minus the fractional
connection cost $C_j^*$ (note that $C_j^* = d(j, \mathcal{D}_j \cup \mathcal{C}_j)$ is the general average
distance to both close and distant facilities) divided by the fractional facility
cost of a client $j$; or it is equal to 0 if $F_j^* = 0$. Since $d(j, \mathcal{D}_j) \le v_j^*$,
$C_j^* = d(j, \mathcal{D}_j \cup \mathcal{C}_j)$ and $C_j^* + F_j^* = v_j^*$, $r_\gamma(j)$ takes values between 0 and 1.
$r_\gamma(j) = 0$ means that client $j$ is served in the solution $(x^*, y^*)$ by facilities that are
all at the same distance. If $r_\gamma(j) = 1$, then the facilities are at different
distances and the distant facilities are all so far from $j$ that $j$ is not willing
to contribute to their opening. In fact, for clients $j$ with $F_j^* = 0$ the value
of $r_\gamma(j)$ is not relevant for our analysis.

Consider yet another quantity, namely $r'_\gamma(j) = r_\gamma(j) \cdot (\gamma - 1)$. Observe
that for a client $j$ with $F_j^* > 0$ we have

$$r'_\gamma(j) = \frac{d(j, \mathcal{D}_j \cup \mathcal{C}_j) - d(j, \mathcal{C}_j)}{F_j^*}.$$

Figure 3: Distances to facilities serving client $j$; the width of a rectangle
corresponding to facility $i$ is equal to $x^*_{ij}$. The figure explains the meaning
of $r_\gamma(j)$ and $r'_\gamma(j)$.

We may use the definitions of $r_\gamma(j)$ and $r'_\gamma(j)$ together with
$C_j^* = d(j, \mathcal{D}_j \cup \mathcal{C}_j)$ to rewrite some distances from client $j$ in the following form
(see also Figure 3):

• the average distance to a close facility is

$D^C_{av}(j) = d(j, \mathcal{C}_j) = C_j^* - r'_\gamma(j) \cdot F_j^*$,

• the average distance to a distant facility is

$D^D_{av}(j) = d(j, \mathcal{D}_j) = C_j^* + r_\gamma(j) \cdot F_j^*$,

• the maximal distance to a close facility is

$D^C_{\max}(j) \le D^D_{av}(j) = C_j^* + r_\gamma(j) \cdot F_j^*$.
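The quantities above can be computed for a single client as in the following sketch. It assumes that average distances are weighted by the scaled openings, that the irregularity is the gap between the average distance to distant facilities and the overall average, divided by the client's fractional facility cost, and that the primed variant uses the gap between the overall average and the average distance to close facilities. All names and numbers are illustrative.

```python
def avg_dist(facilities):
    """Average distance weighted by the openings y_bar."""
    total = sum(y for _, y in facilities)
    return sum(d * y for d, y in facilities) / total

def irregularity(close, distant, v_star):
    """Return (r, r_prime) for one client.

    close and distant are lists of (distance, y_bar) pairs; v_star is
    the client's dual value, so its facility share is v_star - C_j.
    """
    c_star = avg_dist(close + distant)
    f_star = v_star - c_star
    if f_star == 0:
        return 0.0, 0.0
    r = (avg_dist(distant) - c_star) / f_star
    r_prime = (c_star - avg_dist(close)) / f_star
    return r, r_prime

gamma = 1.5
close = [(1.0, 0.75), (2.0, 0.25)]       # total y_bar mass 1
distant = [(2.0, 0.125), (4.0, 0.375)]   # total y_bar mass gamma - 1
r, r_prime = irregularity(close, distant, v_star=5.0)
print(r, r_prime)
```

In the example the close facilities carry mass 1 and the distant ones mass $\gamma - 1$, which is what makes the two parameters differ exactly by the factor $\gamma - 1$.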
In the following lemma we will prove an upper bound on the average
distance from client $j$ to another group of facilities.

Lemma 3.1 Suppose $\gamma < 2$ and that clients $j, j' \in \mathcal{C}$ are neighbors in $(\bar{x}, \bar{y})$,
i.e., $\exists i \in \mathcal{F}$ s.t. $\bar{x}_{ij} > 0$ and $\bar{x}_{ij'} > 0$. Then, either $\mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j) = \emptyset$ or

$$d(j, \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) \le D^D_{av}(j) + D^C_{\max}(j') + D^C_{av}(j').$$

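Proof Assume that $\mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)$ is not empty, since otherwise we are done.

Case 1. Assume that the distance between $j$ and $j'$ is at most
$D^D_{av}(j) + D^C_{av}(j')$. By the simple observation that a maximum is at least the average,
we get

$$d(j', \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) \le D^C_{\max}(j'). \qquad (3)$$

Combining the assumption with (3), we obtain

$$d(j, \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) \le D^D_{av}(j) + D^C_{\max}(j') + D^C_{av}(j').$$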

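Case 2. Assume that the distance between $j$ and $j'$ is longer than
$D^D_{av}(j) + D^C_{av}(j')$. Since $d(j, \mathcal{C}_j \cap \mathcal{C}_{j'}) \le D^D_{av}(j)$, the assumption implies

$$d(j', \mathcal{C}_j \cap \mathcal{C}_{j'}) > D^C_{av}(j'). \qquad (4)$$

Consider the following two sub-cases.

Case 2a. Assume that $d(j', \mathcal{C}_{j'} \cap \mathcal{D}_j) \ge D^C_{av}(j')$.
This assumption together with (4) gives

$$d(j', \mathcal{C}_{j'} \cap (\mathcal{C}_j \cup \mathcal{D}_j)) \ge D^C_{av}(j'). \qquad (5)$$

Recall that $D^C_{av}(j') = d(j', \mathcal{C}_{j'})$. Hence (5) is equivalent to

$$d(j', \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) \le D^C_{av}(j'). \qquad (6)$$

Since $j$ and $j'$ are neighbors, the distance between them is at most
$D^C_{\max}(j) + D^C_{\max}(j')$. By the triangle inequality (1) we may add this distance to (6)
and, using $D^C_{\max}(j) \le D^D_{av}(j)$, get

$$d(j, \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) \le D^D_{av}(j) + D^C_{\max}(j') + D^C_{av}(j').$$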
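Case 2b. In the remaining case we assume that $d(j', \mathcal{C}_{j'} \cap \mathcal{D}_j) < D^C_{av}(j')$.
This assumption may also be written as

$$d(j', \mathcal{C}_{j'} \cap \mathcal{D}_j) = D^C_{av}(j') - z \quad \text{for some } z > 0. \qquad (7)$$

Now we combine (7) with the assumption of Case 2 to get

$$d(j, \mathcal{C}_{j'} \cap \mathcal{D}_j) \ge D^D_{av}(j) + z. \qquad (8)$$

Let $\hat{y} = \sum_{i \in (\mathcal{C}_{j'} \cap \mathcal{D}_j)} \bar{y}_i$ be the total fractional opening of facilities in $\mathcal{C}_{j'} \cap \mathcal{D}_j$
in the modified fractional solution $(\bar{x}, \bar{y})$.

Observe that (8) together with the definition $d(j, \mathcal{D}_j) = D^D_{av}(j)$ implies
that the set $(\mathcal{D}_j \setminus \mathcal{C}_{j'})$ is not empty. Moreover it contains facilities whose
opening variables $\bar{y}$ sum up to $\gamma - 1 - \hat{y} > 0$. More precisely, inequality (8)
implies $d(j, \mathcal{D}_j \setminus \mathcal{C}_{j'}) \le D^D_{av}(j) - z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}$. Hence

$$D^C_{\max}(j) \le D^D_{av}(j) - z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}. \qquad (9)$$

We combine (9) with the assumption of Case 2 to conclude that the minimal
distance from $j'$ to a facility in $\mathcal{C}_{j'} \cap \mathcal{C}_j$ is at least
$D^D_{av}(j) + D^C_{av}(j') - D^C_{\max}(j) \ge D^C_{av}(j') + z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}$. Hence

$$d(j', \mathcal{C}_{j'} \cap \mathcal{C}_j) \ge D^C_{av}(j') + z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}. \qquad (10)$$

Recall that, by definition, $d(j', \mathcal{C}_{j'}) = D^C_{av}(j')$. Hence equality (7) may be
written as

$$d(j', \mathcal{C}_{j'} \setminus \mathcal{D}_j) = D^C_{av}(j') + z \cdot \frac{\hat{y}}{1 - \hat{y}}. \qquad (11)$$

Since, by the assumption that $\gamma < 2$, we have $\frac{\hat{y}}{1 - \hat{y}} < \frac{\hat{y}}{\gamma - 1 - \hat{y}}$, we may also
write

$$d(j', \mathcal{C}_{j'} \setminus \mathcal{D}_j) < D^C_{av}(j') + z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}. \qquad (12)$$

We may now combine (12) with (10) to get

$$d(j', \mathcal{C}_{j'} \setminus (\mathcal{D}_j \cup \mathcal{C}_j)) < D^C_{av}(j') + z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}}. \qquad (13)$$

Finally, we bound the distance from $j$ to $j'$ by $D^C_{\max}(j) + D^C_{\max}(j')$ to get

$$
\begin{array}{rcl}
d(j, \mathcal{C}_{j'} \setminus (\mathcal{C}_j \cup \mathcal{D}_j)) & \le & D^C_{\max}(j) + D^C_{\max}(j') + d(j', \mathcal{C}_{j'} \setminus (\mathcal{D}_j \cup \mathcal{C}_j)) \\
& < & D^D_{av}(j) - z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}} + D^C_{\max}(j') + D^C_{av}(j') + z \cdot \frac{\hat{y}}{\gamma - 1 - \hat{y}} \\
& = & D^D_{av}(j) + D^C_{\max}(j') + D^C_{av}(j'),
\end{array}
$$

where the second inequality is an application of (13) and (9). □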
4 Our new algorithm



Here we again state our algorithm (cf. Section 1.3), but now we use the
notation developed in the previous sections.

Algorithm $A1(\gamma)$:

1. Solve the LP relaxation of the problem to obtain a solution $(x^*, y^*)$.

2. Modify the fractional solution as described in Section 3.2 to obtain a
complete solution $(\bar{x}, \bar{y})$.

3. Compute a greedy clustering for the solution $(\bar{x}, \bar{y})$, choosing as cluster
centers unclustered clients minimizing $D^C_{av}(j) + D^C_{\max}(j)$.

4. For every cluster center $j$, open one of its close facilities randomly with
probabilities $\bar{x}_{ij}$.

5. For each facility $i$ that is not a close facility of any cluster center, open
it independently with probability $\bar{y}_i$.

6. Connect each client to an open facility that is closest to it. 
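Steps 4 to 6 can be sketched as follows for a complete solution that is already given. The function `round_solution`, its parameters, and the example data are assumptions for illustration; the LP solving, sparsening, and clustering of Steps 1 to 3 are not reproduced, and the sketch relies on the fact that in a complete solution the connection variable of a close facility equals its opening.

```python
import random

def round_solution(y_bar, close_of_center, dist, rng):
    """Steps 4-6 for a complete solution.

    y_bar[i]          fractional opening of facility i,
    close_of_center   maps each cluster center to its close facilities
                      (their y_bar values sum to 1, the sets are disjoint),
    dist[j][i]        distance from client j to facility i.
    Returns (opened facilities, assignment of every client).
    """
    opened = set()
    in_cluster = set()
    for center, close in close_of_center.items():
        # Step 4: exactly one close facility, chosen with probability y_bar[i].
        weights = [y_bar[i] for i in close]
        opened.add(rng.choices(close, weights=weights, k=1)[0])
        in_cluster.update(close)
    for i in range(len(y_bar)):
        # Step 5: the remaining facilities are opened independently.
        if i not in in_cluster and rng.random() < y_bar[i]:
            opened.add(i)
    # Step 6: every client is connected to its nearest open facility.
    assignment = {j: min(opened, key=lambda i: d[i]) for j, d in dist.items()}
    return opened, assignment

y_bar = [0.75, 0.25, 0.4, 0.6]
close_of_center = {"a": [0, 1]}
dist = {"a": [1, 2, 5, 6], "b": [3, 2, 1, 4]}
opened, assignment = round_solution(y_bar, close_of_center, dist, random.Random(0))
print(sorted(opened), assignment)
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the outcome still depends on the random draws, so no particular output is claimed here.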

Consider the binary vector $y \in \{0, 1\}^{|\mathcal{F}|}$ encoding the facilities opened
in Steps 4 and 5 of Algorithm $A1(\gamma)$. With the following lemma we give
an upper bound on the expected distance from a client to the closest of the 
facilities opened by the algorithm within a certain subset of facilities. 

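Lemma 4.1 Given are a random vector $y \in \{0, 1\}^{|\mathcal{F}|}$ produced by Algorithm
$A1(\gamma)$, a subset $A \subset \mathcal{F}$ of facilities such that $\sum_{i \in A} \bar{y}_i > 0$, and a client $j \in \mathcal{C}$.
Then, the following holds:

$$E\left[\min_{i \in A,\, y_i = 1} c_{ij} \;\middle|\; \sum_{i \in A} y_i \ge 1\right] \le d(j, A).$$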



Proof Observe that the opening of facilities from $A$ is either pairwise
independent, or there exist disjoint subsets $A_1, A_2, \ldots \subset A$, which correspond
to clusters created in Step 3 of the algorithm, such that the opening of
facilities in each $A_k$ is negatively correlated but facilities from different sets
are uncorrelated. The correlation in these subsets is a result of Step 4 of
the algorithm. In each such $A_k$, there is at most 1 facility opened, and the
probability that one is opened equals $\sum_{i \in A_k} \bar{y}_i$. Therefore, for the purpose
of this proof, we may replace each $A_k$ by a new facility $i_k$ with distance to $j$
equal to $d(j, A_k)$ and fractional opening $\bar{y}_{i_k} = \sum_{i \in A_k} \bar{y}_i$. After this replacement
for each $A_k$, we have a set of facilities that are opened independently.

Consider the facilities from $A$ in the order $i_1, i_2, \ldots$ of nondecreasing
distance from $j$. Since their opening is independent, the probability that $i_l$
counts as closest among the open facilities is

$$p_l = \Pr[y_{i_1} = 0] \cdot \Pr[y_{i_2} = 0] \cdots \Pr[y_{i_{l-1}} = 0] \cdot \Pr[y_{i_l} = 1]
= (1 - \bar{y}_{i_1})(1 - \bar{y}_{i_2}) \cdots (1 - \bar{y}_{i_{l-1}}) \cdot \bar{y}_{i_l}.$$

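The expected distance may be bounded as:

$$
\begin{aligned}
E\left[\min_{i \in A,\, y_i = 1} c_{ij} \;\middle|\; \sum_{i \in A} y_i \ge 1\right]
&= \frac{\sum_{l=1}^{|A|} p_l \, c_{i_l j}}{\sum_{l=1}^{|A|} p_l} \\
&= \frac{\sum_{l=1}^{|A|} \left(\prod_{k=1}^{l-1} (1 - \bar{y}_{i_k})\right) \bar{y}_{i_l} \, c_{i_l j}}{\sum_{l=1}^{|A|} \left(\prod_{k=1}^{l-1} (1 - \bar{y}_{i_k})\right) \bar{y}_{i_l}} \\
&\le \frac{\sum_{l=1}^{|A|} \bar{y}_{i_l} \, c_{i_l j}}{\sum_{l=1}^{|A|} \bar{y}_{i_l}} = d(j, A).
\end{aligned}
$$

The first equality comes from the fact that, under the condition that
$\sum_{i \in A} y_i \ge 1$, the probabilities $p_l$ are rescaled so that they sum
to 1. The inequality is a comparison of two weighted arithmetic averages of the
same distances, where the first one has lower weights for bigger elements. □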
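
The inequality of Lemma 4.1 can be checked numerically for independently opened facilities. The following Python sketch is only an illustration on hypothetical data: the function names and the example values are assumptions, and the fractional openings are treated as independent opening probabilities, as after the replacement step in the proof.

```python
def expected_closest_distance(distances, openings):
    """E[distance to the closest open facility | at least one is open]."""
    pairs = sorted(zip(distances, openings))
    none_closer_open = 1.0  # probability that all closer facilities are closed
    weighted_sum = 0.0
    total_probability = 0.0
    for c, y in pairs:
        p = none_closer_open * y  # plays the role of p_l in the proof
        weighted_sum += p * c
        total_probability += p
        none_closer_open *= 1.0 - y
    return weighted_sum / total_probability


def average_distance(distances, openings):
    """Weighted average distance, playing the role of d(j, A)."""
    return sum(c * y for c, y in zip(distances, openings)) / sum(openings)


distances = [1.0, 2.0, 4.0]  # hypothetical connection costs
openings = [0.5, 0.3, 0.2]   # hypothetical fractional openings
print(expected_closest_distance(distances, openings))  # about 1.5
print(average_distance(distances, openings))           # about 1.9
```

In this example the closest facility receives relatively more weight in the conditional expectation than in the plain weighted average, which is exactly the effect used in the proof.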

In the analysis of our algorithm we will also use the following result: 

Lemma 4.2 Given are $n$ independent events that occur with probabilities
$p_1, p_2, \ldots, p_n$, respectively. The probability that at least one of these
events occurs is at least $1 - \frac{1}{e^{\sum_{i=1}^{n} p_i}}$, where $e$ denotes
the base of the natural logarithm.



Let $\gamma_0$ be defined as the only positive solution to the following equation.

$$\frac{1}{e} + \frac{1}{e^{\gamma_0}} - (\gamma_0 - 1) \cdot \left(1 - \frac{1}{e} + \frac{1}{e^{\gamma_0}}\right) = 0 \qquad (14)$$
Figure 4: Facilities that client $j$ may consider: its close facilities, its
distant facilities, and the close facilities of cluster center $j'$.

An approximate value of this constant is $\gamma_0 \approx 1.67736$. As we will
observe in the proof of Theorem 4.3, equation (14) appears naturally in the
analysis of algorithm A1($\gamma$).
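
The constant can be recomputed numerically. The sketch below assumes that equation (14) reads $\frac{1}{e} + \frac{1}{e^{\gamma_0}} - (\gamma_0 - 1)(1 - \frac{1}{e} + \frac{1}{e^{\gamma_0}}) = 0$, which is the form suggested by the analysis in the proof of Theorem 4.3. The function names and the bisection interval are choices made for this illustration only.

```python
import math


def equation_14(gamma):
    """Left-hand side of equation (14), in the form assumed above."""
    inv_e = 1.0 / math.e
    inv_e_gamma = math.exp(-gamma)
    return inv_e + inv_e_gamma - (gamma - 1.0) * (1.0 - inv_e + inv_e_gamma)


def solve_gamma0(lo=1.0, hi=3.0, iterations=100):
    """Bisection; the left-hand side is positive at lo and negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if equation_14(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


gamma0 = solve_gamma0()
connection_factor = 1.0 + 2.0 * math.exp(-gamma0)
print(gamma0, connection_factor)
```

The second printed value is the connection cost coefficient $1 + 2/e^{\gamma_0}$ that appears at the end of the proof of Theorem 4.3.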

Theorem 4.3 Algorithm A1($\gamma_0$) produces a solution with expected cost

$$E[cost(SOL)] \le 1.67736 \cdot F^* + 1.37374 \cdot C^*.$$

To bound the expected connection cost we show that for each client $j$
there is an open facility within a certain distance with a certain probability.
If $j$ is a cluster center, one of its close facilities is open and the expected
distance to this open facility is
$D^C_{av}(j) = C^*_j - r'_\gamma(j) \cdot F^*_j \le C^*_j$.

If $j$ is not a cluster center, it first considers its close facilities (see
Figure 4). If any of them is open, by Lemma 4.1 the expected distance to the
closest open facility is at most $D^C_{av}(j)$. From Lemma 4.2, at least one close
facility is open with probability $p_c \ge 1 - \frac{1}{e}$.

Suppose none of the close facilities of $j$ is open, but at least one of its
distant facilities is open. Let $p_d$ denote the probability of this event. Again
by Lemma 4.1, the expected distance to the closest facility is then at most
$D^D_{av}(j)$.

If neither any close nor any distant facility of client $j$ is open, then $j$
may connect itself to the facility serving its cluster center $j'$. Again from
Lemma 4.2, such an event happens with probability $p_s \le \frac{1}{e^{\gamma}}$.
We will now use the fact that if $\gamma < 2$ then, by Lemma 3.1 and Lemma 4.1, the
expected distance from $j$ to the facility opened around $j'$ is at most
$D^D_{av}(j) + D^C_{max}(j') + D^C_{av}(j')$.

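Finally, we combine the probabilities of the particular cases with the bounds
on the expected connection cost for each of the cases to obtain the following
upper bound on the expected total connection cost.

$$
\begin{aligned}
E[C_{SOL}] &\le \sum_{j \in \mathcal{C}} \left( p_c \cdot D^C_{av}(j) + p_d \cdot D^D_{av}(j) + p_s \cdot \left(D^D_{av}(j) + D^C_{max}(j') + D^C_{av}(j')\right) \right) \\
&\le \sum_{j \in \mathcal{C}} \left( (p_c + p_s) \cdot D^C_{av}(j) + (p_d + 2p_s) \cdot D^D_{av}(j) \right) \\
&= \sum_{j \in \mathcal{C}} \left( (p_c + p_s) \cdot (C^*_j - r'_\gamma(j) \cdot F^*_j) + (p_d + 2p_s) \cdot (C^*_j + r_\gamma(j) \cdot F^*_j) \right) \\
&= \left( (p_c + p_d + p_s) + 2p_s \right) \cdot C^* \\
&\quad + \sum_{j \in \mathcal{C}} \left( (p_c + p_s) \cdot (-r_\gamma(j) \cdot (\gamma - 1) \cdot F^*_j) + (p_d + 2p_s) \cdot (r_\gamma(j) \cdot F^*_j) \right) \\
&= (1 + 2p_s) \cdot C^* + \sum_{j \in \mathcal{C}} \left( F^*_j \cdot r_\gamma(j) \cdot \left( p_d + 2p_s - (\gamma - 1) \cdot (p_c + p_s) \right) \right) \\
&\le \left(1 + \frac{2}{e^{\gamma}}\right) \cdot C^* + \sum_{j \in \mathcal{C}} \left( F^*_j \cdot r_\gamma(j) \cdot \left( \frac{1}{e} + \frac{1}{e^{\gamma}} - (\gamma - 1) \cdot \left(1 - \frac{1}{e} + \frac{1}{e^{\gamma}}\right) \right) \right)
\end{aligned}
$$

In the above calculation we used the following properties. In the second
inequality we exploited the fact that cluster centers were chosen greedily,
which implies $D^C_{max}(j') + D^C_{av}(j') \le D^C_{max}(j) + D^C_{av}(j)$, together
with $D^C_{max}(j) \le D^D_{av}(j)$. For the last inequality, we used
$p_d + 2p_s = 1 - p_c + p_s \le 1 - (1 - \frac{1}{e}) + \frac{1}{e^{\gamma}} = \frac{1}{e} + \frac{1}{e^{\gamma}}$.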

It remains to observe that by setting $\gamma = \gamma_0 \approx 1.67736$ (see (14))
we eliminate the last term in the connection cost bound, and we obtain
$E[C_{SOL}] \le (1 + \frac{2}{e^{\gamma_0}}) \cdot C^* \le 1.37374 \cdot C^*$. □

The algorithm A1($\gamma_0$) was described as a procedure of rounding a
particular fractional solution to the LP relaxation of the problem. In the
presented analysis we compared the cost of the obtained solution with the cost
of the starting fractional solution. If we appropriately scale the cost function
in the LP relaxation before solving the relaxation, we easily obtain an
algorithm with a bifactor approximation guarantee in a stronger sense. Namely,
we get a comparison of the produced solution with any feasible solution to
the LP relaxation of the problem. Such a stronger guarantee is, however,
not necessary to construct the 1.5-approximation algorithm for the metric
UFL problem, which is presented in the next section.

The algorithm A1($\gamma$) with $\gamma = 1 + \epsilon$ (for a sufficiently small
positive $\epsilon$) is essentially the algorithm of Chudak and Shmoys. Observe
that for regular instances, namely those with $r_\gamma(j) = 0$ for every client
$j$, we do not need to set $\gamma = \gamma_0$ to eliminate the dependence of the
connection cost of the produced solution on the facility opening cost of the
fractional solution. Hence, for regular instances, we get a
$(\gamma, 1 + \frac{2}{e^{\gamma}})$-approximation algorithm for each choice
of $\gamma > 1$.

[Figure 5 plot, titled "Performance of our algorithm - bifactor analysis".
Horizontal axis: facility cost coefficient, from 1 to 3. Legend: regular
instances, extremely irregular instances.]

Figure 5: The performance of our algorithm for different values of parameter
$\gamma$. The solid line corresponds to regular instances with $r_\gamma(j) = 0$ for
all $j$ and it coincides with the approximability lower bound curve. The dashed
line corresponds to instances with $r_\gamma(j) = 1$ for all $j$. For a particular
choice of $\gamma$ we get a horizontal segment connecting those two curves; for
$\gamma \approx 1.67736$ the segment becomes a single point. Observe that for
instances dominated by connection cost only a regular instance may be tight for
the lower bound.
5 The 1.5-approximation algorithm



In this section we will combine our algorithm with an earlier algorithm of 
Jain et al. to obtain a 1.5-approximation algorithm for the metric UFL 
problem. 

In 2002 Jain, Mahdian and Saberi [16] proposed a primal-dual approxi- 
mation algorithm (the JMS algorithm). Using a dual fitting approach they 
showed that it is a 1.61-approximation algorithm. Later Mahdian, Ye and 
Zhang [18] derived the following result. 

Lemma 5.1 ([18]) The cost of a solution produced by the JMS algorithm is
at most $1.11 \cdot F^* + 1.7764 \cdot C^*$, where $F^*$ and $C^*$ are the facility
and connection costs in an optimal solution to the linear relaxation of the
problem.

Theorem 5.2 Consider the solutions obtained with the A1($\gamma_0$) and JMS
algorithms. The cheaper of them is expected to have a cost at most 1.5 times
the cost of the optimal fractional solution.

Proof Consider an algorithm A2 that does the following. With probability
$p = 0.313$ it runs the JMS algorithm and otherwise, with probability $1 - p$,
it runs the A1($\gamma_0$) algorithm. Suppose that we are given an instance, and
that $F^*$ and $C^*$ are the facility and connection costs in an optimal solution
to the linear relaxation of this instance. Consider the expected cost of the
solution produced by algorithm A2 for this instance:

$$
\begin{aligned}
E[cost] &\le p \cdot (1.11 \cdot F^* + 1.7764 \cdot C^*) + (1 - p) \cdot (1.67736 \cdot F^* + 1.37374 \cdot C^*) \\
&\le 1.4998 \cdot F^* + 1.4998 \cdot C^* < 1.5 \cdot (F^* + C^*) \le 1.5 \cdot OPT.
\end{aligned}
$$

The cheaper of the two solutions costs no more than the solution returned by
A2, so the same bound holds for it. □

Instead of the JMS algorithm we could take the algorithm of Mahdian et
al. [18], the MYZ($\delta$) algorithm, which scales the facility costs by $\delta$,
runs the JMS algorithm, scales back the facility costs and finally runs the
greedy augmentation procedure. With the notation introduced in Section 2.3, the
MYZ($\delta$) algorithm is the $S_\delta$(JMS) algorithm. The MYZ(1.504) algorithm
was proven [18] to be a 1.52-approximation algorithm for the metric UFL
problem. We may change the value of $\delta$ in the original analysis to
observe that MYZ(1.1) is a (1.2053, 1.7058)-approximation algorithm. This
algorithm combined with our A1($\gamma_0$) (1.67736, 1.37374)-approximation
algorithm gives a 1.4991-approximation algorithm for UFL. This shows how
much improvement we obtain by using the scaling technique on the greedy
algorithm's side.
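
The two combined guarantees quoted in this section can be recomputed with a few lines of Python. The sketch below is an illustration only: the helper `balanced_mix` is a hypothetical name, and it assumes that one algorithm has the smaller facility factor and the other the smaller connection factor, so that the balancing probability lies between 0 and 1.

```python
def balanced_mix(first, second):
    """Mix two bifactor guarantees so that both coefficients coincide.

    first and second are (facility factor, connection factor) pairs.
    Returns (p, ratio): run the first algorithm with probability p and the
    second one otherwise; ratio is the common value of both coefficients.
    """
    f1, c1 = first
    f2, c2 = second
    # Solve p*f1 + (1-p)*f2 = p*c1 + (1-p)*c2 for p.
    p = (f2 - c2) / ((f2 - c2) + (c1 - f1))
    return p, p * f1 + (1.0 - p) * f2


A1 = (1.67736, 1.37374)
JMS = (1.11, 1.7764)
MYZ_1_1 = (1.2053, 1.7058)

print(balanced_mix(JMS, A1))      # probability close to 0.313
print(balanced_mix(MYZ_1_1, A1))  # ratio close to 1.4991
```

Balancing the two coefficients is the natural choice here, because the worst case of a bifactor guarantee against $F^* + C^*$ is determined by the larger of the two coefficients.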
6 Multilevel facility location



In the $k$-level facility location problem the clients need to be connected to
open facilities on the first level, and each open facility, except those on the
last, $k$-th level, needs to be connected to an open facility on the next level.
Aardal, Chudak, and Shmoys [1] gave a 3-approximation algorithm for the $k$-level
problem with arbitrary $k$. Ageev, Ye, and Zhang [3] proposed a reduction
of a $k$-level problem to a $(k - 1)$-level and a 1-level problem, which results
in a recursive algorithm. This algorithm uses an approximation algorithm for
the single-level problem and has a better approximation ratio, but only for
instances with small $k$. Using our new algorithm A1($\gamma_0$) instead of the
JMS algorithm within this framework improves the approximation ratio for each
number of levels. In particular, in the limit as $k$ tends to $\infty$, we get a
3.236-approximation, which is the best possible for this construction.

By a slightly different method, Zhang [24] obtained a 1.77-approximation
algorithm for the 2-level problem. For the 3-level and the 4-level version
of the problem he obtained 2.523- (see footnote 2) and 2.81-approximation
algorithms by reducing to a problem with a smaller number of levels. In the
following section we will modify the algorithm by Zhang for the 3-level
problem, and use the new (1.67736, 1.37374)-approximation algorithm for the
single-level part, to obtain a 2.492-approximation, which improves on the
previously best known approximation by Zhang. Note that for $k > 4$ the best
known approximation factor is still due to Aardal et al. [1].

6.1 3-level facility location 

We will now present the ingredients of the 2.492-approximation algorithm. 
We start from an algorithm to solve the 2-level version. 

Lemma 6.1 (Theorem 2 in [24]) The 2-level UFL problem may be approximated
by a factor of $1.77 + \epsilon$ in polynomial time for any given constant
$\epsilon > 0$.

Zhang [24] also considered a scaling technique analogous to the one described
in Section 2.3, but applicable to the 2-level version of the problem.
The effect of using this technique is analyzed in the following lemma.

Lemma 6.2 (Theorem 3 in [24]) For any given $\epsilon > 0$, if there is an
$(a, b)$-approximation algorithm for the 2-level UFL problem, then we can get
an approximation algorithm for the 2-level UFL problem with performance
guarantee

$$\left(a + \frac{e}{e - 1} \ln(\Delta) + \epsilon,\; 1 + \frac{b - 1}{\Delta}\right)$$

for any $\Delta \ge 1$.

Footnote 2: This value (2.523) deviates slightly from the value 2.51 given in
the paper. The original argument contained a minor calculation error.

He also uses the following reduction. 

Lemma 6.3 (Lemma 7 in [24]) Assume that the 1-level and 2-level UFL
problems have approximation algorithms with factors $(a, b)$ and
$(\alpha, \beta)$, respectively. Then the 3-level UFL problem may be approximated
by factors $(\max\{a, \frac{a + 2\alpha}{3}\}, \frac{3b + \beta}{2})$.

Zhang [24] observed that the above three statements may be combined
with the MYZ algorithm to improve the approximation ratio for the 3-level
UFL problem. In the following theorem we show that we may use our new
(1.6774, 1.3738)-approximation algorithm for the 1-level UFL problem to get
an even better approximation for the 3-level variant.

Theorem 6.4 There is a 2.492-approximation algorithm for the 3-level UFL
problem.

Proof We first use the algorithm from Lemma 6.1 and the scaling technique
from Lemma 6.2 with $\Delta = 1.57971$, to obtain a
(2.492, 1.48743)-approximation algorithm for the 2-level UFL problem.

Then we use our (1.6774, 1.3738)-approximation algorithm for the 1-level
UFL problem with the scaling technique from Lemma 2.1 with
$\gamma = 2.25827$, to obtain a (2.492, 1.1655)-approximation algorithm for the
1-level UFL problem.

Finally, we use Lemma 6.3 to combine these two algorithms into a
(2.492, 2.492)-approximation algorithm for the 3-level UFL problem. □
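
The last two steps of the proof can be retraced numerically. The Python sketch below rests on two assumptions that are not spelled out in this section: that the scaling of Lemma 2.1 maps an $(a, b)$ guarantee to $(a + \ln\gamma, 1 + (b - 1)/\gamma)$, and that the reduction of Lemma 6.3 yields $(\max\{a, (a + 2\alpha)/3\}, (3b + \beta)/2)$. The function names are hypothetical, and the 2-level guarantee is taken as given from the first step of the proof.

```python
import math


def scale_single_level(a, b, gamma):
    """Scaling for 1-level UFL, in the form assumed above."""
    return a + math.log(gamma), 1.0 + (b - 1.0) / gamma


def combine_for_three_levels(one_level, two_level):
    """Reduction to the 3-level problem, in the form assumed above."""
    a, b = one_level
    alpha, beta = two_level
    return max(a, (a + 2.0 * alpha) / 3.0), (3.0 * b + beta) / 2.0


one_level = scale_single_level(1.6774, 1.3738, 2.25827)
two_level = (2.492, 1.48743)  # taken from the first step of the proof
three_level = combine_for_three_levels(one_level, two_level)
print(one_level)
print(three_level)
```

Under these assumptions both coordinates of the combined guarantee come out at about 2.492, in agreement with Theorem 6.4.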

7 Universal randomized clustering procedure 

In this section we discuss a different approach to clustering. We propose
to modify the greedy clustering algorithm by choosing consecutive cluster
centers randomly with uniform distribution. The output of such a process is
obviously random, but we may still prove some statements about probabilities.
A resulting clustering will be denoted by a function
$g : \mathcal{C} \to \mathcal{C}$, which assigns to each client $j$ the center of
its cluster $j' = g(j)$. The following lemma states that the clustering $g$
obtained with the randomized clustering procedure is expected to be "fair".

Lemma 7.1 Given a graph $G = (\mathcal{F} \cup \mathcal{C}, E)$ and assuming that
a clustering $g$ was obtained by the above described random process, for every
two distinct clients $j$ and $j'$, the probability that $g(j) = j'$ is equal to
the probability that $g(j') = j$.

Proof Let $C(G)$ denote the maximal (over the possible random choices of
the algorithm) number of clusters that can be obtained from $G$ with the
random clustering procedure. The proof will be by induction on $C(G)$.
Fix any $j, j' \in \mathcal{C}$ such that $j$ is a neighbor of $j'$ in $G$ (if they
are not neighbors, neither $g(j) = j'$ nor $g(j') = j$ can occur). Suppose
$C(G) = 1$; then $\Pr[g(j) = j'] = \Pr[g(j') = j] = 1/|\mathcal{C}|$.

Let us now assume that $C(G) > 1$. There are two possibilities: either one
of $j$, $j'$ will belong to the first cluster, or neither of them will. Consider
the first case (the first chosen cluster center is either $j$ or $j'$ or one of
their neighbors). If $j$ ($j'$) is chosen as a cluster center, then $g(j') = j$
($g(j) = j'$). Since they are chosen with the same probability, the contribution
of the first case to the probability of $g(j') = j$ is equal to the contribution
to the probability of $g(j) = j'$. If neither of them gets chosen as a cluster
center but at least one belongs to the new cluster, then neither $g(j') = j$ nor
$g(j) = j'$ is possible.

Now consider the second case (neither $j$ nor $j'$ belongs to the first
cluster). Consider the graph $G'$ obtained from $G$ by removing the first cluster.
The random clustering proceeds as if it had just started with the graph $G'$,
but the maximal number of possible clusters is smaller: $C(G') \le C(G) - 1$.
Therefore, by the inductive hypothesis, in a random clustering of $G'$ the
probability that $g(j') = j$ is equal to the probability that $g(j) = j'$. □
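
The symmetry claimed in Lemma 7.1 can be verified exactly on a small example. The Python sketch below is an illustration under stated assumptions: clients are given directly with their neighbor sets (two clients being neighbors when they share a facility in $G$), and choosing each next center uniformly among the unclustered clients is simulated by enumerating all orders of the clients and skipping those already clustered, which yields the same distribution. The function names and the example graph are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def cluster(order, neighbors):
    """Greedy clustering that considers the clients in the given order."""
    center_of = {}
    for client in order:
        if client in center_of:
            continue  # already clustered, so it cannot become a center
        center_of[client] = client
        for other in neighbors[client]:
            center_of.setdefault(other, client)
    return center_of


def support_probabilities(neighbors):
    """Exact Pr[g(j) = j'] for all pairs, by enumerating all orders."""
    orders = list(permutations(sorted(neighbors)))
    weight = Fraction(1, len(orders))
    prob = {}
    for order in orders:
        for j, center in cluster(order, neighbors).items():
            if j != center:
                prob[j, center] = prob.get((j, center), 0) + weight
    return prob


# Hypothetical example: four clients on a path 0 - 1 - 2 - 3.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
prob = support_probabilities(neighbors)
for (j, center), value in sorted(prob.items()):
    print(j, center, value, prob.get((center, j), 0))
```

Each printed row shows a pair together with the probability in both directions; by the lemma the last two columns agree in every row.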

If $g(j) = j'$ in a clustering $g$ of graph $G$ we will say that client $j'$ offers
support to client j. The main idea behind the clustering algorithms for the 
UFL problem is that we may afford to serve each cluster center directly 
(because they are never neighbors in G) and all the other clients are offered 
support from their cluster centers. A non-central client may either accept 
the support and connect itself via its cluster center (that is what all non- 
central clients do in the algorithm of Shmoys et al.), or it may try to get 
served locally, and if it fails, accept the support (this is the way the Chudak 
and Shmoys' algorithm works). In both those algorithms the probability 
that an offer of support is accepted is estimated to be constant. Therefore, 
we may modify those algorithms to use the random clustering procedure 
and do the following analysis. 
For any two clients $j$ and $j'$, the probability that $j$ accepts the support
of $j'$ is equal to the probability that $j'$ accepts the support of $j$. Let $i$
be a facility on a shortest path from $j$ to $j'$. When we compute the expected
connection cost of client $j$, we observe that with a certain probability $p$ it
accepts the support of $j'$. In such a case it must pay for the route via $i$ and
$j'$ to the facility directly serving $j'$. We will now change the bookkeeping
and say that in this situation $j$ is paying only for the part until facility
$i$, and the rest is paid by $j'$; but if $j$ were supporting $j'$, it would have
to pay a part of the connection cost of $j'$, namely the length of the path from
$i$ via $j$ to the facility serving $j$. We may think of this as each client
having a bank account: when it accepts support it makes a deposit, and when
it offers support and the support is accepted, it withdraws money to
pay a part of the connection cost of the supported client. From Lemma 7.1
we know that for a client $j$ the probability that it will earn on $j'$ is equal
to the probability that it will lose on $j'$. Therefore, if the deposited amount
is equal to the withdrawal, the expected net cash flow is zero.

The above analysis shows that randomizing the clustering phase of the 
known LP-rounding algorithms would not worsen their approximation ra- 
tios. Although it does not make much sense to use a randomized algorithm 
if it has no better performance guarantee, the random clustering has an 
advantage of allowing the analysis to be more local and uniform. 

8 Concluding remarks 

With the 1.52-approximation algorithm of Mahdian et al. it was not clear 
to the authors if a better analysis of the algorithm could close the gap 
with the approximation lower bound of 1.463 by Guha and Khuller. In [6] 
we have recently given a negative answer to this question by constructing 
instances that are hard for the MYZ algorithm. Similarly, we now do not 
know if our new algorithm A1($\gamma_0$) could be analyzed better to close the gap.
Construction of hard instances for our algorithm remains an open problem. 

The technique described in Section 2.3 enables us to move the bifactor
approximation guarantee of an algorithm along the approximability lower
bound of Jain et al. (see Figure 1) towards higher facility opening costs.
If we developed a technique to move the analysis in the opposite direction, 
together with our new algorithm, it would imply closing the approximability 
gap for the metric UFL problem. It seems that with such an approach we 
would have to face the difficulty of analyzing an algorithm that closes some 
of the previously opened facilities. 